Searching for Protein-coding Genes Using Microsatellites in Common 
Carp by Comparing to Zebrafish EST Database 
2. College of Life Sciences, Zhongshan University, Guangzhou 510275, China) 


Abstract: In this study, an in silico approach was used to identify homologies between common carp 
microsatellite sequences and the GenBank database using Blastn and Blastx searches. About 875 microsatellite sites of 
common carp with flanking sequences over 50 bp were first compared to the zebrafish EST database, and Blastn found 
121 homologies. Subsequent Blastx searches confirmed 94 sites recorded in the protein database: 33 hypothetical 
proteins, three unknown proteins and 58 characterized proteins, seven of which have been mapped to two linkage maps. 
In addition, two polymorphic STS markers were developed from matched zebrafish EST sequences by the PCR-SSCP method, 
of which one, HLJZe33, was mapped successfully. This study was a pilot for comparative studies between common carp 
and zebrafish, and the results demonstrate that more genetic and genomic resources of zebrafish can be used for the 
genome research of common carp. 
The common carp (Cyprinus carpio L.) is an important freshwater aquaculture species, widely 
distributed from Southeast Asia to Europe and the Mediterranean region (David et al, 2001). 
Though it is important for aquaculture, its genomic information is limited and lags behind 
that of tilapia (Oreochromis spp.), rainbow trout (Oncorhynchus mykiss) and channel catfish 
(Ictalurus punctatus), whose genome projects were launched by the United States Department of 
Agriculture (USDA) in 1997. Presently, about 32103 ESTs and 1871 protein sequences are 
available in GenBank (http://www.ncbi.nlm.nih.gov). The first genetic linkage map of common 
carp published contained only 268 DNA markers (Sun & Liang, 2004). 
Additionally, some microsatellite markers have been 
reported for this species (Crooijmans et al, 1997; Aliah et 
al, 1999; Wei et al, 2001). Although most microsatellite 
markers are considered type II markers because they lie 
in non-coding regions of the genome, some are located in 
protein-coding regions and are associated with functional 
genes. The latter has been confirmed in two ways. In the 
first, microsatellite markers were identified by screening 
cDNA libraries or EST sequences (Kantety et al, 2002; 
Yue & Orban, 2004; Ju et al, 2005). In the second, the 
flanking sequences of microsatellite repeats were matched 
to genes using BLAST searches. This in silico data 
mining approach has already been used to identify genes 
in mouse (Mus musculus), cattle (Bos taurus), pig (Sus 
scrofa), chicken (Gallus gallus) and tilapia (Cnaani et al, 
2002; Herron et al, 1998; Farber & Medrano, 2003). In 
this study, we used a similar approach to search for 
genes using microsatellite sequences of common carp. 
The results will help annotate these microsatellites 
and anchor more genes to the linkage map. More importantly, 
they will pave the way for future comparative genomic 
research between common carp and zebrafish (Danio rerio). 


1 Materials and methods 


1.1 Data collections 

Approximately 1000 microsatellite sequences of 
common carp were used. Most were developed in our 
laboratory, and a small number were downloaded from 
GenBank. All sequences were saved in FASTA format. 
Zebrafish ESTs (dbEST) were retrieved in FASTA format from 
the GenBank database using the Entrez nucleotide query 
webpage (http://www.ncbi.nlm.nih.gov/sites/entrez?) at 
the National Center for Biotechnology Information 
(NCBI). 
1.2 Batched blast 

All repeats were masked and deleted from the target 
sequences using software programmed in the lab. 
Flanking sequences (>50 bp) were queried against the 
expressed sequence tag (EST) database of zebrafish 
using local blastn searches. A batch BLAST program 
written in our laboratory ran an individual search for each 
locus. The blastn parameters were a word size of 11 (at 
least 11 consecutive matching nucleotides), an expect 
value (E) of less than e-10, a mismatch penalty of five, 
and a penalty of three for gap opening and extension. 
Matched zebrafish ESTs were selected manually and 
queried against the non-redundant (NR) protein database 
using Blastx searches. The Blastx search required at 
least three consecutive aligned amino acids and an E 
value of less than 0.001. 
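
The repeat-masking and flank-selection step described above can be illustrated with a minimal Python sketch. The in-house software is not described in the text, so everything here is an assumption made for illustration: the function names, the FASTA handling, and in particular the definition of a repeat (a 1-6 bp motif repeated at least five times in tandem). Only the 50 bp threshold comes from the text.

```python
import re

MIN_FLANK_BP = 50  # the text keeps flanking sequences over 50 bp

# Assumption (not from the paper): a microsatellite is a 1-6 bp motif
# repeated at least five times in tandem.
REPEAT = re.compile(r"([ACGT]{1,6}?)\1{4,}")


def parse_fasta(text):
    """Return {record_id: sequence} for FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def flanking_sequences(seq, min_len=MIN_FLANK_BP):
    """Delete tandem repeats from seq and keep the pieces longer than min_len."""
    pieces = []
    start = 0
    for match in REPEAT.finditer(seq):
        pieces.append(seq[start:match.start()])
        start = match.end()
    pieces.append(seq[start:])
    return [piece for piece in pieces if len(piece) > min_len]


def batch_queries(fasta_text):
    """Yield (query_id, flank) pairs, one query per retained flank."""
    for name, seq in parse_fasta(fasta_text).items():
        for i, flank in enumerate(flanking_sequences(seq), start=1):
            yield f"{name}_flank{i}", flank
```

Each yielded pair could then be written out as a separate FASTA query for a local similarity search. A locus whose flanks are all 50 bp or shorter produces no query, which mirrors the filtering step in the text.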
1.3 Primer design and PCR-SSCP 

Sequences annotated above were chosen for primer 
design using Primer3 (Rozen & Skaletsky 2000). A 
first set of 10 primer pairs was synthesized and used for 
PCR amplification. Six primer pairs amplified target 
bands in 36 F, hybrids (Cyprinus carpio var. 
wuyuanensis × C. pellegrini pellegrini). Amplification 
reactions were performed on a GeneAmp 9700 
(Applied Biosystems, Inc.) in a 15 µL volume containing 
10-50 ng of template, 10 mmol/L Tris-HCl (pH 8.3), 50 
mmol/L KCl, 200 µmol/L each dNTP, 0.1% Triton X-100, 
0.1% NP-40, 0.01% gelatin, 0.4 mmol/L of each primer, 
and 1 U of Taq polymerase (Shanghai Sangon, China). 
PCR cycle parameters for all primers were: initial 
denaturation for 3 min at 94°C, followed by 24 cycles of 
30 s at 93°C, 30 s at 50-55°C and 30 s at 72°C, and a final 
extension step of 10 min at 72°C. A mixture of 1 µL 
PCR product and 5 µL loading buffer (95% formamide, 
0.2% bromophenol blue, 0.2% xylene cyanol FF) was 
denatured for 10 min at 98°C on a thermocycler, then 
chilled quickly on ice for at least 10 min. The 6 µL 
mixture was subjected to electrophoresis on an 8% 
non-denaturing polyacrylamide gel. After electrophoresis, 
the gel was silver-stained and scanned. 
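
As a quick check on the cycling program stated above, the following Python sketch adds up the programmed hold times. The step labels and function name are illustrative, and ramp times between temperatures are ignored, so the actual run time on the instrument would be longer.

```python
# Cycling program as stated in the text; ramping between steps is ignored.
INITIAL_DENATURATION_MIN = 3        # 94 °C
CYCLES = 24
STEP_SECONDS = {
    "denaturation (93 °C)": 30,
    "annealing (50-55 °C)": 30,
    "extension (72 °C)": 30,
}
FINAL_EXTENSION_MIN = 10            # 72 °C


def programmed_hold_minutes(initial_min=INITIAL_DENATURATION_MIN,
                            cycles=CYCLES,
                            steps=STEP_SECONDS,
                            final_min=FINAL_EXTENSION_MIN):
    """Total programmed hold time in minutes, excluding ramp times."""
    cycling_seconds = cycles * sum(steps.values())
    return initial_min + cycling_seconds / 60 + final_min


if __name__ == "__main__":
    print(programmed_hold_minutes())  # 3 + 24 * 1.5 + 10 minutes
```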


2 Results 


2.1 Genes found by Blastn and Blastx homology 
searches 

After eliminating short flanking sequences (<50 bp), 
the remaining 875 carp microsatellites with flanking 
sequences over 50 bp were compared to the zebrafish EST 
database using local blastn searches, and a total of 121 
homologous sequences were found. The blastx search 
confirmed 94 sequences recorded in the GenBank protein 
database as of 5th December, 2007: 58 characterized 
proteins, 33 hypothetical proteins and three unknown 
proteins. Hypothetical and uncharacterized proteins 
were excluded from the following analysis 
(Tab. 1). These results were compared to two 
different carp genetic linkage maps constructed by our 
research team (unpublished): a recombinant 
inbred line (RIL) map and a self-crossed F map. 
Twelve microsatellite sites denoted by HLJ in Tab. 1 
were used in genetic linkage analysis. Of these, three sites, 




Tab. 1 Genes found by blastn and blastx homologous comparisons between common carp microsatellite sequences 
and the GenBank database 

Carp        Zebrafish EST    Identity      Matched gene (Accession No.)                            Similarity     E-value  Species*
clone No.   (Accession No.)  Blastn (%)                                                            Blastx (%)
LIC_41      DN857925         61/66 (92)    SORCS receptor (XP_421750)                              52/110 (47)    2e-18    GG,MM,TN
HLJZel      BMS861022        128/137 (93)  Serine/threonine-protein kinase (Q9NYY3)                134/216 (62)   4e-50    HS,RN,MM
LIC_77      EB929248         50/53 (94)    HnRNP methyltransferase (XP_001366105)                  78/89 (87)     4e-57    TN,MM,HS
LIC_65      AL928408         243/263 (92)  Down syndrome cell adhesion molecule (XP_693739)        112/183 (61)   4e-44    DR,TN,MM
LID_53      EB885308         40/41 (97)    Capping protein (NP_956229)                             110/113 (97)   Te-59    DR,MM
HLJ174      EB777153         93/102 (91)   General transcription factor (XP_001345566)             63/63 (100)    1e-30    DR,MM,HS
HLJ321      CF997715         114/122 (93)  Gamma-dystrobrevin (ABF55376)                           233/235 (99)   3e-139   DR,MM,RN
HLJ721      CN319735         111/115 (96)  Na+/K+ ATPase (AAK33032)                                204/204 (100)  1e-116   DR,MM,HS
LIF_46      DN897423         39/40 (97)    Pre-B cell enhancing factor (XP_686052)                 85/91 (93)     2e-18    DR,TN
LIF_72      EB905145         60/65 (92)    Transposase (XP_001338227)                              64/107 (59)    9e-29    DR,OM
LIH_63      CN171307         191/208 (91)  Transposase (XP_001339494)                              44/90 (48)     1e-26    DR,OM
LII-78      CT703965         75/81 (92)    Transposase (CAB51371)                                  110/271 (40)   8e-42    PP,OM,DR
LIK_08      CT664262         35/35 (100)   Transposase (CAB51372)                                  57/86 (66)     9e-51    PP,OM,DR
LIF_90      DN597649         55/60 (91)    Solute carrier family 39 member (NP_997748)             198/207 (95)   9e-82    DR,TN,OM
LIF-30      EB937013         65/71 (91)    Talin 1 (NP_001009560)                                  191/191 (100)  4e-101   DR,RN
LIF-36      CT695338         49/52 (94)    Alpha-actinin (XP_001345739)                            23/24 (95)     0.001    DR
LIF-57      EE303946         60/61 (98)    MGC83562 protein (XP_421214)                            52/126 (41)    2e-24    GG,XL
LIF-81      AL925325         65/69 (94)    VAV-3 protein (XP_001515846)                            61/108 (56)    3e-20    OA,MM,RN
LIF-20      EB900829         45/47 (95)    Protein tyrosine phosphatases epsilon (XP_695831)       25/25 (100)    8e-08    DR,TN,HS
HLJ354      CT635692         97/103 (94)   Activin receptor IIB (XP_697649)                        134/134 (100)  2e-80    DR,HS,GG
HLJ356      CN014012         128/145 (88)  T-cell immune regulator (NP_998234)                     248/260 (95)   1e-122   DR,TN,RN
HLJ870      EE325778         91/95 (95)    Phenylalanyl-tRNA synthetase (AAI28806)                 222/222 (100)  8e-132   DR,HS,MM
LII_01      CT609085         49/53 (92)    Sorting nexin 13 (AAO15004)                             23/38 (60)     3e-05    TR,GG,MM
HLJ376      DY556876         89/91 (97)    Actin-related protein (NP_001003944)                    212/212 (100)  2e-121   DR,TN,MM
HLJ380      EE716632         37/38 (97)    Diaphorase (NADH) (NP_956483)                           230/233 (98)   2e-123   DR,TN
LII_47      EG580779         62/65 (95)    136342 protein (AAI46619)                               240/242 (99)   8e-149   DR,MD,GG
HLJ390      EB852635         85/90 (94)    DNA binding protein (XP_696090)                         154/154 (100)  1e-72    DR,HS,MM
LII-82      EB885308         40/41 (97)    Capping protein (NP_956229)                             110/113 (97)   Te-59    DR,TN,HS
LIK_06      CT676279         84/89 (94)    Alpha-actinin (AAN77132)                                190/190 (100)  9e-107   DR,TN
LIK_56      EG571207         71/80 (96)    VAMP-associated protein (NP_001002546)                  172/173 (99)   1e-115   DR,HS,RN
LIK_70      CT602809         35/35 (100)   Oviductin (XP_001377953)                                23/26 (88)     0.005    MD
LIK_80      EG573763         35/36 (97)    G protein pathway suppressor (EDM06902)                 142/145 (97)   1e-72    RN,DR,MM
LIK_82      CT609085         49/53 (92)    Sorting nexin 13 (AAO15004)                             23/38 (60)     3e-05    TR,GG,MM
HLJ418      EB927888         99/110 (90)   Damage-specific DNA binding protein (EDM12828)          182/216 (84)   8e-100   DR,RN,TN
LIL-14      CN014012         132/149 (88)  T-cell immune regulator (NP_998234)                     248/260 (95)   1e-122   DR,TN,GG
LIL-19-2    CK684754         37/38 (97)    ORF2-encoded protein (BAE46429)                         43/50 (86)     6e-16    DR,PO
HLJZe33     CK691226         71/76 (93)    Meningioma (NP_001093672)                               117/183 (63)   2e-45    XT,TN,RN
LIL-63      CK146298         59/62 (95)    Nrp1b protein (AAI33731)                                107/109 (98)   3e-59    DR,MAM
LIL-73      EE699974         91/102 (89)   Fibroblast growth factor (NP_001093754)                 138/193 (71)   9e-70    DR,XT,HS
LIL-91      EB952613         61/66 (92)    DnaJ (Hsp40) homolog (NP_955956)                        126/126 (100)  4e-68    DR,MD
LIL-96      DV590107         76/85 (89)    Dynactin 3 (NP_001002220)                               186/187 (99)   4e-99    DR,TN,XL
LIM_18      EB931517         46/49 (93)    Triple functional domain (NP_001097996)                 132/145 (91)   2e-60    DR,TN,HS
LIM_64      DY554324         122/135 (90)  Protein tyrosine phosphatase (NP_956140)                174/233 (74)   2e-72    DR,XL,HS
LIM-46      EB961560         95/103 (92)   Ubiquitin protein ligase E3 (AAH81553)                  90/90 (100)    2e-29    DR,MM,HS
LIN-15      DN597286         94/102 (92)   Mitochondrial peptide chain release factor (CAC24560)   70/126 (55)    9e-37    PF,TN,RN
LIN-60      CN505827         50/52 (96)    Sema domain (NP_998164)                                 256/259 (98)   7e-130   DR,MD,HS
HLJ779      EE718464         35/36 (97)    ATP-binding domain (NP_001014203)                       129/181 (71)   4e-89    RN,DR
LIN-72      DT060464         104/114 (91)  RNA splicing factor (NP_596908)                         97/143 (67)    2e-38    DR,RN,MM
LIN-76      EB946165         137/139 (98)  Testican-2 (XP_690238)                                  94/94 (100)    8e-55    DR,TN,XL
LIN-91      CK396800         144/159 (90)  ROD1 protein (XP_001335967)                             131/131 (100)  2e-68    DR,RN
LIO-11      BM889554         52/54 (96)    Osmotic stress transcription factor (AAT84345)          44/61 (72)     5e-12    TN,MD
LIP-C6      EE699974         91/102 (89)   Fibroblast growth factor (NP_001093754)                 138/193 (71)   9e-70    XT,DR,PT
LIP-E07     EE303946         60/61 (98)    MGC83562 protein (XP_421214)                            52/126 (41)    2e-24    GG,XL
LIP-H21     AL911930         118/128 (92)  66097 protein (AAH74055)                                59/59 (100)    6e-27    DR,TN
LIP-H07     DN597649         55/60 (91)    Solute carrier family 39 (NP_997748)                    198/207 (95)   9e-82    DR,TR
LIP-H11     DV585441         44/46 (95)    Acetylglucosaminyltransferase (XP_001114281)            67/118 (56)    1e-32    MAM,DR,HS
LIQ_48      CK704847         36/37 (97)    Caveolin 3 (NP_991301)                                  150/150 (100)  7e-85    DR,TN,GG
HLJ526      EB912091         112/126 (88)  Vertebrate t-complex testis expressed 1 (CAI12013)      101/104 (97)   4e-52    DR,TN

*The names of species were abbreviated as follows: DR, Danio rerio; GG, Gallus gallus; HS, Homo sapiens; MM, Mus musculus; RN, 
Rattus norvegicus; MD, Monodelphis domestica; OA, Ovis aries; PO, Paralichthys olivaceus; XL, Xenopus laevis; TN, Tetraodon 
nigroviridis; OM, Oncorhynchus mykiss; PP, Pleuronectes platessa; XT, Xenopus tropicalis; MAM, Macaca mulatta; PF, Platichthys flesus; 
PT, Pan troglodytes. 


HLJ380 (LG40), HLJ418 (LG19) and HLJ870 (LG4), were 
linked to the RIL map, and four sites, HLJ321 (LG6), 
HLJ376 (LG8), HLJ526 (LG14) and HLJ779 (LG8), were 
anchored to the self-crossed map. 
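
The counts and map assignments reported in this subsection can be cross-checked with a short Python sketch. The numbers and marker names are transcribed from the text; the variable and function names are illustrative only.

```python
# Counts and map assignments transcribed from the text of this subsection.
BLASTN_HITS = 121
BLASTX_CLASSES = {"characterized": 58, "hypothetical": 33, "unknown": 3}

# marker -> (linkage map, linkage group)
MAPPED_SITES = {
    "HLJ380": ("RIL", "LG40"),
    "HLJ418": ("RIL", "LG19"),
    "HLJ870": ("RIL", "LG4"),
    "HLJ321": ("self-crossed", "LG6"),
    "HLJ376": ("self-crossed", "LG8"),
    "HLJ526": ("self-crossed", "LG14"),
    "HLJ779": ("self-crossed", "LG8"),
}


def blastx_confirmed(classes=BLASTX_CLASSES):
    """Number of blastn hits confirmed by blastx, summed over protein classes."""
    return sum(classes.values())


def markers_by_map(mapped=MAPPED_SITES):
    """Group mapped markers by linkage map, sorted by marker name."""
    grouped = {}
    for marker, (map_name, group) in sorted(mapped.items()):
        grouped.setdefault(map_name, []).append((marker, group))
    return grouped


if __name__ == "__main__":
    print(blastx_confirmed(), "of", BLASTN_HITS, "blastn hits confirmed by blastx")
    for map_name, markers in markers_by_map().items():
        print(map_name, markers)
```

The three protein classes sum to the 94 blastx-confirmed sequences, and the three RIL sites plus four self-crossed sites give the seven mapped sites mentioned in the abstract.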
2.2 Two new sequence tag sites (STS) developed by 
PCR-SSCP method 

Ten primer pairs were redesigned from matched 
zebrafish EST sequences according to the primer 
design criteria. Six primer sets amplified target bands 
clearly, but only two, HLJZel and HLJZe33, were 
polymorphic in the 36 F, individuals (Fig. 1). 
Both STS markers were then used for linkage 
analysis, and only HLJZe33 was mapped successfully 
(Fig. 2). 


3 Discussion 


According to our experience in microsatellite development 
and application, only 10%-20% of microsatellites 
were useful in population genetics or other research 
fields. Most microsatellite sequences were abandoned 
due to failures in primer design or PCR amplification. 
However, flanking sequences of microsatellites are usually 
highly conserved within a species and its close relatives, 
and those conserved sequences might lie in coding regions. 
Coding sequences are more conserved than non-coding 
sequences and are thus better suited for relating genes 
between different species (Ju et al, 2005). Common carp 
belongs to the same family, Cyprinidae, as zebrafish, 
which has been developed as a genetic model organism 
and for which abundant genetic data have accumulated. 
It is therefore advantageous to apply zebrafish genomic 
and genetic data to the genome research of the common 
carp. In fact, the first genetic linkage map of common 
carp contained 65 zebrafish microsatellites (Sun & Liang, 
2004). Quan et al (2006) utilized 6072 zebrafish 
microsatellites to detect polymorphisms in C. c. 
haematopterus, C. c. var. wuyuanensis and C. pellegrini 
pellegrini. As a result, 646 amplified target bands (9%), 
of which 563 were polymorphic and useful in the three 
carp. In our study, by comparing flanking sequences of 
common carp microsatellites to zebrafish dbEST, 121 
homologous genes (7%) were found with high identities 
(>85%). These results demonstrate that the genetic and 
genomic resources of zebrafish can be used in the genetic 
and genomic research of the common carp. 

Fig. 1 Polymorphisms of two STS markers in 36 F, hybrids; each lane stands for one individual 
Fig. 2 One STS marker, HLJZe33, was mapped to linkage group 10 of the self-crossed F map 

Using Blastx searches, 58 sites were annotated and 
associated with functional genes, and 
eight of them were anchored to linkage maps (seven 